53 research outputs found
Programmable Logic Arrays
Programmable logic arrays (PLAs) are traditional digital electronic devices.
A PLA is a simple programmable logic device (SPLD) used to implement
combinational logic circuits. A PLA has a set of programmable AND gates, which
link to a set of programmable OR gates to produce an output. The AND-OR layout
of a PLA allows for implementing logic functions that are in a sum-of-products
form. PLAs come in different types: they can be standalone chips or parts of
larger processing systems. Standalone PLAs are available as mask-programmable
(MPLA) and field-programmable (FPLA) devices.
The attractions of PLAs that brought them to mainstream engineers include their
simplicity, relatively small circuit area, predictable propagation delay, and
ease of development. The powerful-but-simple property brought PLAs to rapid
prototyping, synthesis, design optimization techniques, embedded systems,
traditional computer systems, hybrid high-performance computing systems, etc.
Indeed, there has been renewed interest in working with the simple AND-to-OR
PLAs.
Comment: 19 pages, 18 figures. arXiv admin note: text overlap with
arXiv:1905.02075, arXiv:1905.0207
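The AND-OR structure described in this abstract can be illustrated with a small software model. The following is a minimal sketch, not taken from the paper: the function name evaluate_pla, the dictionary encoding of the AND plane, and the half-adder example are all assumptions chosen for illustration.

```python
from itertools import product


def evaluate_pla(and_plane, or_plane, inputs):
    """Evaluate a sum-of-products network on one input vector.

    and_plane: list of product terms; each term maps an input index to the
               bit value that input must have (hypothetical encoding).
    or_plane:  list of outputs; each output lists the product-term indices
               that are ORed together.
    """
    # AND plane: a product term is 1 only if every listed literal matches.
    products = [
        all(inputs[index] == bit for index, bit in term.items())
        for term in and_plane
    ]
    # OR plane: an output is 1 if any of its product terms is 1.
    return [int(any(products[p] for p in terms)) for terms in or_plane]


# Illustrative half adder: sum = a.b' + a'.b, carry = a.b
HALF_ADDER_AND = [{0: 1, 1: 0}, {0: 0, 1: 1}, {0: 1, 1: 1}]
HALF_ADDER_OR = [[0, 1], [2]]

if __name__ == "__main__":
    for a, b in product((0, 1), repeat=2):
        total, carry = evaluate_pla(HALF_ADDER_AND, HALF_ADDER_OR, (a, b))
        print(a, b, total, carry)
```

In this model, "programming" the device amounts to choosing the contents of the two tables, which mirrors how the programmable AND and OR planes fix the implemented functions.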
Higher-Level Hardware Synthesis of The KASUMI Algorithm
Programmable Logic Devices (PLDs) continue to grow in size and currently
contain several millions of gates. At the same time, research effort is going
into higher-level hardware synthesis methodologies for reconfigurable computing
that can exploit PLD technology. In this paper, we explore the effectiveness
of, and extend, one such formal methodology in the design of massively parallel
algorithms. We take a step-wise refinement approach to the development of
correct reconfigurable hardware circuits from formal specifications. A
functional programming notation is used for specifying algorithms and for
reasoning about them. The specifications are realised through the use of a
combination of function decomposition strategies, data refinement techniques,
and off-the-shelf refinements based upon higher-order functions. The
off-the-shelf refinements are inspired by the operators of Communicating
Sequential Processes (CSP) and map easily to programs in Handel-C (a hardware
description language). The Handel-C descriptions are directly compiled into
reconfigurable hardware. The practical realisation of this methodology is
evidenced by a case study of third-generation mobile communication security
algorithms. The investigated algorithm is the KASUMI block cipher. In this
paper, we obtain several hardware implementations with different performance
characteristics by applying different refinements to the algorithm. The
developed designs are compiled and tested under Celoxica's RC-1000
reconfigurable computer with its 2-million-gate Virtex-E FPGA. Performance
analysis and evaluation of these implementations are included.
High Performance Reconfigurable Computing Systems
The rapid progress and advancement in electronic chips technology provide a
variety of new implementation options for system engineers. The choice varies
between the flexible programs running on a general-purpose processor (GPP) and
the fixed hardware implementation using an application specific integrated
circuit (ASIC). Many other implementation options exist, for instance, a
system with a RISC processor and a DSP core. Other options include graphics
processors and microcontrollers. Specialist processors certainly improve
performance over general-purpose ones, but this comes as a quid pro quo for
flexibility. Combining the flexibility of GPPs and the high performance of
ASICs leads to the introduction of reconfigurable computing (RC) as a new
implementation option with a balance between versatility and speed. The focus
of this chapter is on introducing reconfigurable computers as modern
supercomputing architectures. The chapter also investigates the main reasons behind
the current advancement in the development of RC-systems. Furthermore, a
technical survey of various RC-systems is included laying common grounds for
comparisons. In addition, this chapter mainly presents case studies implemented
under the MorphoSys RC-system. The selected case studies belong to different
areas of application, such as computer graphics and information coding.
Parallel versions of the studied algorithms are developed to match the
topologies supported by the MorphoSys. Performance evaluation and results
analyses are included for implementations with different characteristics.
Comment: 53 pages, 14 tables, 15 figures
High-level Synthesis
Hardware synthesis is a general term used to refer to the processes involved
in automatically generating a hardware design from its specification.
High-level synthesis (HLS) could be defined as the translation from a
behavioral description of the intended hardware circuit into a structural
description, similar to the compilation of programming languages (such as C and
Pascal) into assembly language. The chained synthesis tasks at each level of the
design process include system synthesis, register-transfer synthesis, logic
synthesis, and circuit synthesis. The development of hardware solutions for
complex applications is no longer a complicated task with the emergence of
various HLS tools. Many areas of application have benefited from the modern
advances in hardware design, such as automotive and aerospace industries,
computer graphics, signal and image processing, security, complex simulations
like molecular modeling, and DNA matching. The field of HLS is continuing its
rapid growth to facilitate the creation of hardware and to blur more and more
the border separating the processes of designing hardware and software.
Comment: 19 Pages, 16 Figures. arXiv admin note: text overlap with
arXiv:1905.02075, arXiv:1905.0207
Performance Analysis of Linear Algebraic Functions using Reconfigurable Computing
This paper introduces a new mapping of geometrical transformation on the
MorphoSys (M1) reconfigurable computing (RC) system. New mapping techniques for
some linear algebraic functions are recalled. A new mapping for geometrical
transformation operations is introduced and its performance on the M1 system
is evaluated. The translation and scaling transformations addressed in this
mapping employ some vector-vector and vector-scalar operations [6-7]. A
performance analysis study of the M1 RC system is also presented to evaluate
the efficiency of the algorithm execution. Numerical examples were simulated to
validate our results, using the MorphoSys mULATE program, which emulates M1
operations.
Comment: 22 pages, 17 figures, 5 tables. arXiv admin note: substantial text
overlap with arXiv:1904.04953; text overlap with arXiv:1904.0619
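The abstract above relates translation and scaling to vector-vector and vector-scalar operations. The following is a minimal software sketch of that relationship, assuming uniform scaling and points stored as tuples; the function names translate and scale are hypothetical, and nothing here reflects the actual MorphoSys mapping.

```python
def translate(points, offset):
    """Translate each point by a vector-vector addition with `offset`."""
    return [tuple(p + o for p, o in zip(point, offset)) for point in points]


def scale(points, factor):
    """Scale each point uniformly by a vector-scalar multiplication."""
    return [tuple(factor * p for p in point) for point in points]


if __name__ == "__main__":
    square = [(0, 0), (1, 0), (1, 1), (0, 1)]
    print(translate(square, (2, 3)))
    print(scale(square, 4))
```

Each point is processed independently of the others, which is the property that makes such operations natural candidates for an array of parallel processing cells.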
An Analysis Framework for Hardware and Software Implementations with Applications from Cryptography
With the richness of present-day hardware architectures, tightening the
synergy between hardware and software has attracted great attention. The
interest in unified approaches paved the way for newborn frameworks that target
hardware and software co-design. This paper confirms that a unified statistical
framework can successfully classify algorithms based on a combination of the
heterogeneous characteristics of their hardware and software implementations.
The proposed framework produces customizable indicators for any hybridization
of processing systems and can be contextualized for any area of application.
The framework is used to develop the Lightness Indicator System (LIS) as a
case-study that targets a set of cryptographic algorithms that are known in the
literature to be tiny and light. The LIS targets state-of-the-art multi-core
processors and high-end Field Programmable Gate Arrays (FPGAs). The presented
work includes a generic benchmark model that aids the clear presentation of the
framework and extensive performance analysis and evaluation.
Comment: 20 Pages, 6 Figures, 5 Tables
Analysis of Pipelined KATAN Ciphers under Handel-C for FPGAs
Embedded Systems are everywhere from the smartphones we hold in our hands to
the satellites that hover around the earth. These embedded systems are being
increasingly integrated into our personal and commercial infrastructures. More
than 98% of all processors are implanted and used in embedded systems rather
than traditional computers. As a result, security in embedded systems now more
than ever has become a major concern. Since embedded systems are designed to be
low-cost, fast and real-time, it would be appropriate to use tiny, lightweight
and highly secure cryptographic algorithms. The KATAN and KTANTAN families of
lightweight block ciphers are promising cryptographic options. In this paper,
a sequential hardware design is developed under Handel-C. Taking a step
further, Handel-C's parallel construct is taken advantage of to develop a
parallel-pipelined hybrid implementation. Both sequential and
parallel-pipelined implementations are tested under Altera Quartus to implement
and analyze hardware designs in conjunction with DK Design Suite's Handel-C
compiler. The developed designs are mapped to Altera's Stratix II that is one
of the industry's highest bandwidth and density FPGAs. The results confirm that
using Handel-C can provide faster implementations. The obtained results are
promising: the developed designs, the parallel-pipelined processor in
particular, show better performance than similar implementations.
Comment: 6 pages, 3 figures, 6 tables
Cyclic Coding Algorithms under MorphoSys Reconfigurable Computing System
This paper introduces reconfigurable computing (RC) and specifically chooses
one of the prototypes in this field, MorphoSys (M1) [1 - 5]. The paper
addresses the results obtained when using RC in mapping algorithms pertaining
to digital coding in relation to previous research [6 - 10]. The chosen
algorithms relate to cyclic coding techniques, namely the CCITT CRC-16 and the
CRC-16. A performance analysis study of the M1 RC system is also presented to
evaluate the efficiency of the algorithm execution on the M1 system. For
comparison purposes, three other systems were used to map the same algorithms
showing the advantages and disadvantages of each compared with the M1 system.
The algorithms were run on the 8x8 RC (reconfigurable) array of the M1
(MorphoSys) system; numerical examples were simulated to validate our results,
using the MorphoSys mULATE program, which simulates MorphoSys operations.
Comment: 30 Pages, 11 figures, 9 tables. arXiv admin note: substantial text
overlap with arXiv:1904.0495
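For readers unfamiliar with the two cyclic codes named in this abstract, the following is a bit-serial software sketch. The abstract does not state which parameter sets were used, so the choices here are assumptions: the CCITT variant uses polynomial 0x1021 processed most-significant bit first with a configurable initial value, and the plain CRC-16 uses polynomial 0x8005 in its reflected form 0xA001 with a zero initial value. The function names are hypothetical.

```python
def crc16_ccitt(data: bytes, init: int = 0xFFFF) -> int:
    """CRC-16 with the CCITT polynomial x^16 + x^12 + x^5 + 1 (0x1021).

    Bits are processed most-significant first; `init` is an assumed
    parameter (0xFFFF and 0x0000 are both in common use).
    """
    crc = init
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def crc16_arc(data: bytes) -> int:
    """CRC-16 with polynomial x^16 + x^15 + x^2 + 1 (0x8005).

    Bits are processed least-significant first, so the reflected
    polynomial constant 0xA001 is used.
    """
    crc = 0x0000
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xA001
            else:
                crc >>= 1
    return crc


if __name__ == "__main__":
    message = b"123456789"
    print(hex(crc16_ccitt(message)), hex(crc16_arc(message)))
```

The inner loop is a shift-and-conditional-XOR step repeated once per message bit, which is why CRC computation maps well onto hardware shift registers and arrays of simple processing cells.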
Parallel Algorithms Development for Programmable Devices with Application from Cryptography
Reconfigurable devices, such as Field Programmable Gate Arrays (FPGAs), have
been witnessing a considerable increase in density. State-of-the-art FPGAs are
complex hybrid devices that contain up to several millions of gates. Recently,
research effort has been going into higher-level parallelization and hardware
synthesis methodologies that can exploit such a programmable technology. In
this paper, we explore the effectiveness of one such formal methodology in the
design of parallel versions of the Serpent cryptographic algorithm. The
suggested methodology adopts a functional programming notation for specifying
algorithms and for reasoning about them. The specifications are realized
through the use of a combination of function decomposition strategies, data
refinement techniques, and off-the-shelf refinements based upon higher-order
functions. The refinements are inspired by the operators of Communicating
Sequential Processes (CSP) and map easily to programs in Handel-C (a hardware
description language). In the presented research, we obtain several parallel
Serpent implementations with different performance characteristics. The
developed designs are tested under Celoxica's RC-1000 reconfigurable computer
with its 2-million-gate Virtex-E FPGA. Performance analysis and evaluation of
these implementations are included.
Comment: 47 Pages, 16 Figures, 4 Tables. arXiv admin note: text overlap with
arXiv:1904.0375
Multigrid Solvers in Reconfigurable Hardware
The problem of finding the solution of Partial Differential Equations (PDEs)
plays a central role in modeling real world problems. Over the past years,
Multigrid solvers have shown their robustness over other techniques, due to
their high convergence rate, which is independent of the problem size. For this
reason, many attempts for exploiting the inherent parallelism of Multigrid have
been made to achieve the desired efficiency and scalability of the method. Yet,
most efforts fail in this respect due to many factors (time, resources)
governed by software implementations. In this paper, we present a hardware
implementation of the V-cycle Multigrid method for finding the solution of a
2D-Poisson equation. We use Handel-C to implement our hardware design, which we
map onto available Field Programmable Gate Arrays (FPGAs). We analyze the
implementation performance using the FPGA vendor's tools. We demonstrate the
robustness of Multigrid over other iterative solvers, such as Jacobi and
Successive Over Relaxation (SOR), in both hardware and software. We compare our
findings with a C++ version of each algorithm. The obtained results show better
performance when compared to existing software versions.
Comment: 24 Pages, 11 Figures, 10 Tables
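The V-cycle named in this abstract can be sketched in software as follows. This is a minimal illustration under stated assumptions, not the paper's Handel-C design: it solves -∇²u = f on the unit square with zero boundary values, uses weighted Jacobi smoothing (weight 0.8, two sweeps before and after), full-weighting restriction, and bilinear interpolation. All function names and parameter choices are hypothetical.

```python
def smooth(u, f, h, sweeps, omega=0.8):
    """Weighted Jacobi sweeps for the 5-point discretisation of -lap(u) = f."""
    n = len(u)
    for _ in range(sweeps):
        new = [row[:] for row in u]
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                jac = 0.25 * (u[i - 1][j] + u[i + 1][j] + u[i][j - 1]
                              + u[i][j + 1] + h * h * f[i][j])
                new[i][j] = (1 - omega) * u[i][j] + omega * jac
        u = new
    return u


def residual(u, f, h):
    """Residual r = f - A u at interior points (zero on the boundary)."""
    n = len(u)
    r = [[0.0] * n for _ in range(n)]
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            r[i][j] = f[i][j] - (4 * u[i][j] - u[i - 1][j] - u[i + 1][j]
                                 - u[i][j - 1] - u[i][j + 1]) / (h * h)
    return r


def max_residual(u, f, h):
    return max(abs(v) for row in residual(u, f, h) for v in row)


def restrict(r):
    """Full-weighting restriction to the grid with every other point."""
    nc = (len(r) - 1) // 2 + 1
    c = [[0.0] * nc for _ in range(nc)]
    for i in range(1, nc - 1):
        for j in range(1, nc - 1):
            a, b = 2 * i, 2 * j
            c[i][j] = (4 * r[a][b]
                       + 2 * (r[a - 1][b] + r[a + 1][b] + r[a][b - 1] + r[a][b + 1])
                       + r[a - 1][b - 1] + r[a - 1][b + 1]
                       + r[a + 1][b - 1] + r[a + 1][b + 1]) / 16.0
    return c


def prolong(e, n):
    """Bilinear interpolation of a coarse-grid correction to n x n points."""
    fine = [[0.0] * n for _ in range(n)]
    for i in range(len(e)):
        for j in range(len(e)):
            fine[2 * i][2 * j] = e[i][j]
    for i in range(0, n, 2):          # fill odd columns on even rows
        for j in range(1, n, 2):
            fine[i][j] = 0.5 * (fine[i][j - 1] + fine[i][j + 1])
    for i in range(1, n, 2):          # fill odd rows from the rows around them
        for j in range(n):
            fine[i][j] = 0.5 * (fine[i - 1][j] + fine[i + 1][j])
    return fine


def v_cycle(u, f, h):
    """One V-cycle; the grid size must be 2**k + 1 points per side."""
    n = len(u)
    if n <= 3:                        # one interior unknown: solve exactly
        u = [row[:] for row in u]
        u[1][1] = 0.25 * (u[0][1] + u[2][1] + u[1][0] + u[1][2] + h * h * f[1][1])
        return u
    u = smooth(u, f, h, 2)
    coarse_rhs = restrict(residual(u, f, h))
    nc = len(coarse_rhs)
    correction = v_cycle([[0.0] * nc for _ in range(nc)], coarse_rhs, 2 * h)
    fine_correction = prolong(correction, n)
    u = [[u[i][j] + fine_correction[i][j] for j in range(n)] for i in range(n)]
    return smooth(u, f, h, 2)


if __name__ == "__main__":
    n = 17
    h = 1.0 / (n - 1)
    f = [[1.0] * n for _ in range(n)]
    u = [[0.0] * n for _ in range(n)]
    for cycle in range(6):
        u = v_cycle(u, f, h)
        print(cycle + 1, max_residual(u, f, h))
```

The smoothing and residual loops update every grid point from its four neighbours only, which is the locality that a hardware implementation can exploit for parallel evaluation.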